Dried fish dataset for Indian seafood: A machine learning application


Abstract
Drying fish is a simple and economical way to process the catch. It creates a profitable business for coastal communities by providing a market for their catches, even during periods of abundance. It is a traditional method of preserving fish, especially valuable in regions where fresh fish is not readily available or affordable throughout the year. This dataset provides 8290 images designed for machine learning applications. It focuses on the five most popular types of dried seafood in India: prawns (shrimp), small anchovies (tingali), golden anchovies (mandeli), mackerel (bangada), and Bombay duck (bombil). To provide high-quality data for the identification and classification of dried fish varieties, the dataset includes images of single specimens and of bulk quantities for each category. The dataset utilizes standardized lighting, background, and object pose to support machine learning performance. It enables researchers and data scientists to apply machine learning to the Indian dried fish industry, with the aim of improving its standardization, quality control, safety, and efficiency.

Value of the Data
• This dataset [1], the first open-access collection of its kind, contains images of five prominent Indian dried fish species. Its 8290 high-quality images are organized into five main classes, each with two subcategories.
• AI models trained on this dataset can automatically identify, sort, and grade dried fish varieties according to their outward appearance. The dataset can also be used to develop food recognition apps that identify dried fish dishes in images, which could benefit recipe identification, dietary tracking, and automated food analysis in restaurants [2][3][4].
• This dataset can help improve online marketplaces for dried fish by enabling image-based search for specific types and by giving buyers visual information about the appearance and condition of the product.
• Researchers can use models pre-trained on the dried fish dataset as a starting point for their own tasks. Transfer learning of this kind can save time on tasks that require similar image recognition capabilities but involve different objects or categories.
• This dataset serves as a benchmark for evaluating and comparing the performance of computer vision algorithms on dried fish recognition tasks, allowing researchers to assess and improve their own models.
• This collection of images is a resource for researchers in food science, animal science, and other disciplines.

Background
Drying fish is a traditional method of extending the shelf life of fish. Drying concentrates the nutrients, making dried fish a rich source of protein and minerals. This is particularly valuable in regions where fresh fish is not readily available or affordable throughout the year. Dried fish also contributes to the economic well-being of coastal communities in India. Detailed images can improve online marketplaces by enabling image-based search and providing buyers with clear visual information about the product.
AI and computer vision are increasingly applied across the food industry, including animal science, for processing, inspection, and quality control. A dried fish dataset contributes to this development.

Data Description
From agriculture to the social sciences, image datasets serve as building blocks for many fields. The "Dried Fish Dataset" aims to support machine learning by providing a large and diverse collection of high-resolution images that can be used to develop programs that automatically identify and classify dried fish.
To mimic real-world complexities, the dataset includes high-resolution images (1365 × 1024 pixels at 72 dpi) featuring clear and detailed visuals of dried fish. Encompassing a broad spectrum of backgrounds, lighting, and angles, the dataset features images with both single and multiple dried fish specimens. This diversity is intended to improve model robustness in real-world scenarios.
The dataset captures the diversity of dried fish at a local market in Pune, India. It features 8290 images across five common dried fish types: Prawns (Shrimp), Small Anchovy (Tingali), Golden Anchovy (Mandeli), Mackerel (Bangada), and Bombay Duck (Bombil). To showcase variations, each category includes both single-fish and bulk-quantity photos, stored in separate folders. For a realistic representation, the images were captured with a consistent background but under various lighting conditions, both indoors and outdoors. Table 1 provides a detailed breakdown of image distribution across categories, and Fig. 1 depicts the dataset's folder structure.
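As a rough illustration of how the per-category breakdown could be recomputed from the folders, the following Python sketch counts image files per category and subcategory. It assumes a layout of the form `<root>/<category>/<subcategory>/<image>`; the actual folder names are those shown in Fig. 1, and the function name `count_images` and the accepted file suffixes are hypothetical choices made for this example.

```python
from collections import Counter
from pathlib import Path

# Assumed image suffixes; the text states the images are stored as JPEG.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def count_images(root):
    """Return a Counter mapping (category, subcategory) to an image count.

    Assumes the hypothetical layout <root>/<category>/<subcategory>/<image>.
    Files that are not nested at least two folders deep are ignored.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) >= 3:  # category / subcategory / file name
                counts[(parts[0], parts[1])] += 1
    return counts
```

Calling `count_images("path/to/dataset")` and summing the values gives a total that can be compared with the stated 8290 images; the path here is a placeholder.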

Experimental design
Fig. 2 illustrates the image capture process. We used two high-resolution smartphone cameras: the iPhone 6 (Apple) and the Realme 6i (Realme). Initially, over 11,000 images were captured. To ensure high quality, each image was reviewed for object clarity, and blurry images were removed during preprocessing. This resulted in a final set of 8290 well-focused images representing the five fish types. For efficient access, the final images were organized into designated folders based on their classification. Dried fish were photographed under various conditions from September to November, with natural and artificial lighting, from different angles, and against the same background. To ensure uniformity, a Python script was used during preprocessing to resize all images to a standard dimension of 1365 × 1024 pixels.
In essence, capturing images at the fish market formed the first step. These images then underwent a quality check and standardization process to become part of the final dataset.
Table 2 shows sample images of each category of species in its single and bulk form.

Materials or specification of image acquisition system
This section details the camera equipment used for image capture and the resulting image specifications. To guarantee consistent image quality and compatibility across the dataset, all captured images were saved in JPEG format and resized to a standard resolution of 1365 × 1024 pixels. This standardization helps the dataset work with a range of machine learning applications (Fig. 3).
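The resize-and-save step described above might look like the following minimal sketch. It is not the authors' script: it assumes the Pillow library, and the function name `resize_folder`, the resampling filter, and the JPEG quality setting are choices made for this example.

```python
from pathlib import Path

from PIL import Image  # Pillow; assumed here, not named in the text

TARGET_SIZE = (1365, 1024)  # (width, height) in pixels, as stated in the text


def resize_folder(src_dir, dst_dir, size=TARGET_SIZE):
    """Resize every JPEG under src_dir to `size` and save it under dst_dir.

    The folder tree of src_dir is mirrored in dst_dir. Returns the number
    of images written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    written = 0
    # sorted() materializes the file list before any output is written.
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Force RGB so the result can always be written as JPEG.
            resized = img.convert("RGB").resize(size, Image.LANCZOS)
            resized.save(dst, "JPEG", quality=95)  # quality is an assumption
        written += 1
    return written
```

Resizing to a fixed 1365 × 1024 stretches any image whose aspect ratio is not close to 4:3. Cropping or padding first would avoid that, but the text does not say which approach was taken.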

Method
To ensure high-quality and compatible images for our dataset, we applied a systematic preprocessing method. We used Batch Image Resizer, a popular tool for batch image resizing. This allowed us to handle large image collections quickly, which suits research involving image-based machine learning, image analysis, and data augmentation. After resizing, the images were stored and numbered in sequence.
This preprocessing does more than prepare data for complex models: it supports consistent analysis throughout the research and strengthens the reliability and effectiveness of the methods.
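The sequential numbering mentioned above can be sketched with the Python standard library alone. The naming pattern `<prefix>_0001.jpg`, the function name `number_in_sequence`, and the zero-padding width are hypothetical; the text does not specify the naming scheme actually used.

```python
from pathlib import Path


def number_in_sequence(folder, prefix, width=4):
    """Rename the JPEGs in `folder` to <prefix>_0001.jpg, <prefix>_0002.jpg, ...

    Files are numbered in sorted order of their current names. Returns the
    list of new paths.
    """
    folder = Path(folder)
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    # Pass 1: move everything to temporary names so that a target name
    # can never collide with a file that has not been renamed yet.
    staged = []
    for i, src in enumerate(files, start=1):
        tmp = src.with_name(f"__tmp_{i}{src.suffix}")
        src.rename(tmp)
        staged.append(tmp)
    # Pass 2: assign the final sequential names.
    renamed = []
    for i, tmp in enumerate(staged, start=1):
        dst = folder / f"{prefix}_{i:0{width}d}.jpg"
        tmp.rename(dst)
        renamed.append(dst)
    return renamed
```

Running it once per class folder, with a prefix derived from the folder name, would keep file names unique across the dataset.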

Highlighting the dataset's value
To assess the dataset's ability to enhance machine learning model performance in dried fish classification, we conducted experiments using established pre-trained models such as InceptionV3, Xception, and MobileNetV2. These models were fine-tuned on our dataset, and their accuracy in classifying dried fish types was evaluated. Table 3 reports the pre-training and post-training accuracy scores obtained by the pre-trained models when fine-tuned on the dried fish image dataset for the classification task [5][6][7].
The dried fish dataset can be used to fine-tune machine learning models such as InceptionV3, Xception, EfficientNetB0, VGG16, and ResNet50 for improved accuracy. By training AI models on this data, researchers can develop more reliable tools for automated sorting and grading of dried fish based solely on appearance, which can improve efficiency and consistency within the food processing industry. The dataset can also support food recognition applications that identify dried fish dishes in images, with use cases in recipe identification, dietary tracking tools, and automated food analysis in restaurants.
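An accuracy figure of the kind reported in Table 3 requires held-out images from every class. The sketch below shows one way to build a class-balanced train/validation split and to compute accuracy, using only the Python standard library. It rests on assumptions: the text does not describe the split that was used, and the 80/20 ratio, the fixed seed, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train and validation lists.

    Each label keeps roughly the same share in both lists. At least one
    sample per label goes to validation.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_val = max(1, round(len(paths) * val_fraction))
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The labels can be taken from the top-level folder name of each image, and the validation list is then passed to whichever fine-tuned model is being evaluated.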

Limitations
The dataset does not cover all categories of dried fish. It covers only the most popular types available in the Pune region of India.

Table 1
Breakdown of image distribution of the dataset.

Table 2
Sample images of each category.

Table 3
Comparison of ML models for pre-training and post-training on the dried fish dataset.